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Parallel tempering and population annealing are both effective methods for simulating equilibrium 
systems with rough free energy landscapes. Parallel tempering, also known as replica exchange 
Monte Carlo, is a Markov chain Monte Carlo method while population annealing is a sequential 
Monte Carlo method. Both methods overcome the exponential slowing associated with high free 
energy barriers. The convergence properties and efficiency of the two methods are compared. For 
large systems, population annealing initially converges to equilibrium more rapidly than parallel 
tempering for the same amount of computational work. However, parallel tempering converges 
exponentially and population annealing inversely in the computational work so that ultimately 
parallel tempering approaches equilibrium more rapidly than population annealing. 



I. INTRODUCTION 

Equilibrium systems with rough free energy landscapes, such as spin glasses, configurational glasses and proteins, 
are difficult to simulate using conventional Monte Carlo methods because the simulation tends to be trapped in 
metastable states and fails to explore the full configuration space. A number of techniques have been proposed to 
overcome this problem. Some of these techniques involve simulating an extended state space that includes many 
temperatures [UI2]- Multicanonical simulations, umbrella sampling, the Wang Landau method, simulated tempering 
and parallel tempering all fall into this class. Parallel tempering [SHE], also known as replica exchange Monte Carlo, 
is perhaps the most widely used of these methods because it is simple to program and performs well in many settings. 
It is the standard method for simulating spin glasses [3 |5] and is used for protein folding [5J [TU] and lattice gauge 
theory [TT] . 

Parallel tempering and the other members of its class are all Markov chain Monte Carlo methods. In Markov 
chain Monte Carlo, the target distribution is approached via repeated application of an elementary process that 
typically satisfies detailed balance with respect to the target distribution. In the case of parallel tempering, the 
target distribution is a joint distribution whose marginals are equilibrium distributions for a set of temperatures. In 
sequential Monte Carlo, by contrast, the target distribution is the last member of a sequence of distributions, each of 
which is visited once. The initial distribution is easy to equilibrate and a resampling step transforms one distribution 
to the next in the sequence. Population annealing [T2H3] is a sequential Monte Carlo algorithm in which the sequence 
of distributions are equilibrium distributions of decreasing temperature. 

In this paper we will describe both parallel tempering and population annealing and compare their efficiency and 
convergence properties in the context of a simple, tractable free energy landscape comprised of two wells separated by 
a high barrier. Although this free energy landscape is highly simplified compared to the landscapes of more realistic 
models, we believe that it captures some of the essential features of rough free energy landscapes and that the lessons 
learned from this analysis will be useful in understanding and improving the performance of both parallel tempering 
and population annealing in realistic settings. In Ref. |14j we analyzed the performance of parallel tempering for this 
landscape. 

Although they are based on quite different strategies, parallel tempering (PT) and population annealing (PA) share 
a number of common features. Both are methods that build on a conventional Markov chain Monte Carlo procedure 
whose stationary distribution is a fixed temperature equilibrium ensemble, such as the Metropolis algorithm or Glauber 
dynamics. We refer to this procedure as the equilibrating subroutine. At sufficiently high temperature the equilibrating 
subroutine converges rapidly to the equilibrium ensemble. Both PT and PA take advantage of this rapid equilibration 
at high temperature to accelerate the convergence to equilibrium at lower temperatures. Both PT and PA attempt 
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to transform equilibrium high temperature configurations into equilibrium low temperature configurations through a 
sequence of temperature steps such that the system remains close to equilibrium. In PT there is a single replica of 
the system at each temperature in the sequence, and replicas are allowed to move between temperatures via replica 
exchange. These replica exchange moves are carried out with acceptance probabilities that satisfy detailed balance so 
that the entire set of replicas tends toward equilibrium at their respective temperatures. 

Population annealing is closely related to simulated annealing [151 116j . In simulated annealing a single realization 
of the system is cooled from high to low temperature following an annealing schedule. After each temperature 
step the system is out of equilibrium and the equilibrating subroutine is used to move it back toward equilibrium. 
However, at low temperatures, the equilibrating subroutine is unable to facilitate transitions between different minima 
of the free energy landscape, and simulated annealing falls out of equilibrium if the weights associated with the free 
energy minima vary with temperature, as is typically the case. Thus simulated annealing cannot be used to sample 
equilibrium ensembles, and its primary use is to find ground states. Population annealing solves this problem by 
simultaneously cooling a population of replicas of the system through a sequence of temperatures. Each temperature 
step is associated with a resampling of the population so that some replicas are copied and other replicas are destroyed 
in such a way that the replicas are correctly weighted in the colder ensemble. In this way, at least for large populations, 
the population remains close to equilibrium as the system is cooled. The resampling step in population annealing is 
similar to methods used in diffusion Monte Carlo [17] and the "go with the winner" strategy |18) . 

Sequential Monte Carlo methods [19] . of which population annealing is an example, are not well known in statistical 
physics but have been widely applied in statistics and finance. One purpose of this paper is to bring this general 
method to the attention of computational statistical physicists. We argue that PA may have an important role to 
play in simulations of systems with rough free energy landscapes and has some advantages over parallel tempering, 
especially in situations where a moderately accurate result is required quickly and parallel computing resources are 
available. 

The outline of the paper is as follows. We describe population annealing in Sec. [n] and parallel tempering in Sec. 
Ill Section IV introduces the two-well free energy landscape, and Sec. [V] analyzes the performance of population 



annealing in this landscape. Section |VI| compares the performance of population annealing and parallel tempering, 
and the conclusions of this section are supported by numerical results presented in Sec. |VII| Section [VIII| concludes 
the paper with a discussion. 

II. POPULATION ANNEALING 

The population annealing algorithm operates on a population of R replicas of the system. For disordered spin 
systems, each replica has the same set of couplings. The algorithm consists of cooling the population of replicas 
through an annealing schedule from a high temperature, where equilibrium states are easily sampled, to a low target 
temperature, where the equilibrating subroutine cannot by itself feasibly equilibrate the system. The annealing 
schedule is defined by a set of S + 1 inverse temperatures, 

A> > fa >...,> fa. (i) 

The highest temperature, l/fts, is chosen to be a temperature for which it is easy to equilibrate the system. It is 
often convenient to choose @s = as this facilitates the calculation of the absolute free energy of the system at each 
temperature in the annealing schedule. 

In each temperature step the population is resampled and the equilibrating subroutine is applied to every replica 
at the new temperature. The first part of a temperature step from /3 to fa is resampling the population so that lower 
energy replicas are multiplied while higher energy replicas are eliminated from the population. Suppose that the 
population is in equilibrium at fa the relative weight of a replica j with energy Ej at inverse temperature fa is given 
by exp [—(fa — fa)Ej\, Thus, the expected number of copies of replica j that appear in the resampled population at 
fa is 



where Q is the normalization given by 



Vf . exp [-(fa - fajEA 
Q(fafa)= ^ 1 V[ ^ P P) Jl . (3) 



The new population of replicas is generated by resampling the original population such that the expected number of 
copies of replica j is Pj . The actual number of copies n\ , Ti2, ■ ■ ■ , of each replica in the new population is given by 
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the multinomial distribution p [R; m, . . . , ur] Pi/R, ■ ■ ■ , Pr/R] for R trials. In this implementation the population size 
is fixed. Other valid resampling methods are available. For example, the number of copies of replica j can be chosen 
as a Poisson random variable with mean proportional to pj(j3,j3'), in which case the population size fluctuates |13j . 

For large R and small (/?'— /3), the resampled distribution is close to an equilibrium ensemble at the new temperature, 
P' . However, the regions of the equilibrium distribution for P' that differ significantly from the equilibrium distribution 
for (3 are not well sampled, leading to biases in the population at /?'. In addition, due to resampling, the replicas 
are no longer independent. To mitigate both of these problem, the equilibrating subroutine is now applied. Finally, 
observables are measured by averaging over the population. 

The entire algorithm consists of S steps: in step k the temperature is lowered from fts-k+i to Ps-k via resampling 
followed by the application of the equilibrating subroutine and data collection at temperature Ps-k- 

Population annealing permits one to estimate free energy differences. If the annealing schedule begins at infinite 
temperature corresponding to f3g — 0, then it yields an estimate of the absolute free energy F(Pk) at every temperature 
in the annealing schedule. The following calculation shows that the normalization factor Q(f3,f3') is an estimator of 
the ratio of the partition functions at the two temperatures: 



Z(P') _ E n 



-03'-/3)E 7 



3=1 

The summation over 7 is a sum over the microstates of the system while the sum over j is a sum over the population 
of replicas in PA. The last approximate equality becomes exact in the limit R — > 00. From Eq. [4] the estimated free 
energy difference from /3 to P' is found to be 

-P'F(p') = -pF(p)+logQ(p,p'), (5) 

where F((3) is the free energy at /3 and F is the estimated free energy at P' . Given these free energy differences, if 
Ps = 0j then the PA estimator of the absolute free energy at each simulated temperature is 

s 

-p k F(p k )= logQC^ft-iJ + logfi, (6) 

l=k+l 

where fl = 1 is the total number of microstates of the system; i.e. , ks logfi is the infinite temperature entropy. 

III. PARALLEL TEMPERING 

Parallel tempering, also known as replica exchange Monte Carlo, simultaneously equilibrates a set of R replicas of 
a system at S inverse temperatures 

Pa >px > ...,p s -i- (7) 

There is one replica at each temperature so that R = S in contrast to population annealing, where typically the 
number R of replicas greatly exceeds the number S of temperatures; i.e. , R ^> S. The equilibrating subroutine 
operates on each replica at its respective temperature. Replica exchange moves are implemented that allow replicas 
to diffuse in temperature space. The first step in a replica exchange move is to propose a pair of replicas (fc, k — 1) at 
neighboring temperatures p = Pk and P' = Pk—i- The probability for accepting the replica exchange move is 



Pswap 



(8) 



Here E and E' are the respective energies of the replicas that were originally at P and P' . If the move is accepted, 
the replica equilibrating at p is now set to equilibrate at P' and vice versa. Equation [8] insures detailed balance so 
that the Markov chain defined by parallel tempering converges to a joint distribution whose marginals are equilibrium 
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distributions at the S temperatures of Eq. [7J Diffusion of replicas in temperature space allows round trips from low 
to high temperature and back. The benefit of these roundtrips is that free energy barriers are crossed in a time that 
grows as a power of the barrier height P3] rather than exponentially with respect to the barrier height as is the 
case for most single temperature dynamics. Optimization schemes for PT depend in part on adjusting parameters to 
maximize the rate of making roundtrips 21) 2"2"1 . 



IV. TWO- WELL MODEL FREE ENERGY LANDSCAPE 



In this section we describe a simple free energy landscape with two minima such as occurs, for example, in the low 
temperature phase of the Ising model or c^ 4 field theories. This free energy was introduced in Ref. |14| in the context 
of analyzing the efficiency of parallel tempering. For ft > ft c and ft c a critical temperature, the free energy F a (ft) 
associated with each well is defined by 



where 



ftF r7 (ft) = -\(ft-ftc) 2 A rT , (9) 

2 _(K + H/2 ifa=l 
Act - \ K - H/2 if a = 1 Uj 

and a labels the well. The deep well corresponds to a = 1 and the shallow well to a — 0. The well index a is the 
only macroscopic parameter in the model and the "landscape" is zero dimensional. However, we also assume that the 
free energy at the saddle point between the wells is zero so that F is the free energy barrier between the wells. The 
landscape is flat at ft = ft c . The parameter if is a proxy for system size. In more realistic systems, barrier heights 
typically grow as a power of the number of degrees of freedom N of the system. 

The statistics of the energy of microstates in each well follows from this free energy using thermodynamics. The 
internal energy U a (ft) is the average of the energy distribution in well a and is obtained from 

U„(ft) = ^ = -(ft-ft c )Al. (11) 

Using the relationship between specific heat and energy fluctuations, we find that the variance of the energy in well 
a is simply A 2 . 

The free energy also determines the probability c{ft) of being in the deep well according to 

c ^ = E W = 1 + e -w - ( 12 ) 

Microstates of the model are specified by an energy and a well index. We assume that in equilibrium, the distribution 
of energies, conditioned on the well index <r, is a normal distribution with mean U a (ft) and variance A 2 . Thus, the 
well index a is a Bernoulli random variable such that a = 1 with probability c(ft) and a — with probability 1 — c(ft). 
The energy E is given by 

/ J\T(0iG9),A?) if cr = 1, 
*-\N(U (ft),A 2 ) i£a = 0, 

where N(fi, A 2 ) is a normal random variable with mean and variance A 2 . 

The dynamics of the model under the equilibrating subroutine for ft < ft c is assumed to have the following properties. 
The well index is conserved except at the critical temperature, ft c . That is, there are no transitions between the wells 
except for ft = ft c . On the other hand, for ft > ft c the equilibrating subroutine is assumed to equilibrate the system 
within each well in a single time unit. Thus, the sequence of energies produced by successive steps of the equilibrating 
subroutine will be i.i.d. normal random variables N(U a (ft), A 2 ), where a is the well index. For ft = ft c the equilibrating 
subroutine first chooses one of the wells with equal probability and then chooses the energy from the associated normal 
distribution. 



V. CONVERGENCE OF POPULATION ANNEALING 



We first consider a single step of population annealing from inverse temperature ft to inverse temperature ft' > ft. 
We will compute the error made by population annealing in the free energy and the fraction of the population in the 
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deep well as a function of the number of replicas, the size of the temperature step [}' — 0, and the parameters /3 C , K 
and H of the two- well model. Let 

F,=exp [-(/?' -flEj+toj], (14) 

where Ej is the energy and <jj is the well index of replica j. Setting A = yields the un-normalized re- weighting 
factor (see Eq. [2| of replica j from inverse temperature (3 to /3'. The extra term Xaj in the exponent will be used to 
calculate the probability of being in the deep well. 

To obtain the error in the free energy and the fraction of replicas in the deep well at temperature /?' assuming the 
correct equilibrium distribution at /?, we compute i(/3,/3', A), the expectation of the logarithm of Y: 

R 

W,A)=Elog(-^). (15) 

3=1 

Using Eq. [4j we obtain the PA estimate of the free energy difference by setting A = in I(/3, /?', A): 

P'F{P') - pF[fl) = -1(0, P,0) = E(logQ(/3',/3)). (16) 

In this equation, j3'F(j3') is the estimate of the free energy at f3' given the exact value at (3. The fraction in the deep 
well at the lower temperature is obtained by differentiation with respect to A: 



. (/3 , ) = d W ',A) 



dX 



(17) 

A=0 



Our goal is to determine how much these estimates deviate from the corresponding exact values. 

Let S n be a sum of n independent, identically distributed random variables Xj. Generically, one can use Taylor's 
Theorem to prove that the leading terms in an asymptotic expansion of the expectation of a function f(S n /n) have 
the form 

Ef(SJn) = E/QT X 3 /n) = f(EX) + i-/"(EX)VarX + O f^) , (18) 
j= i n \ n J 

where E(X) and Yar(X) denote the expectation and variance of Xj, respectively. Thus, for our case, 

1 VarY / 1 \ 

^/?',A)=log(Ey)-- w + o(^. (19) 

The first term is the exact result and the second term is the leading order systematic error in the population annealing 
estimate due to a finite population size. Setting A = 0, we have 

<"W-»-^ + o(^) (20) 

This result shows that the systematic error decreases as the inverse of the population size and that the free energy 
approaches the exact value from above as the number of replicas increases. 

The variance of the free energy estimator was observed to be a useful measure of the convergence of the algo- 
rithm [13] . Here we formalize that observation by computing the variance of f3'F(f3') presuming that f3F((3) is exactly 
known and thus has no variance. The variance of the free energy estimator is given by, 



Var(/3'F(/3')) = E^log 2 ifj 1 



-{fi'-P)E i 




(21) 



Applying Eq. 18 to both log and log 2 and substituting the results into Eq. 21 yields 



1 ~VarY f 1 \ 

Var <^ ( «=S<^+°(^> (22) 

where A is set to zero on the RHS of this equation. Comparing Eqs. [22] and [20} we find that 

pFtf) - W) = \ Var(/3'F(/3')) + O \ . (23) 
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This equation is useful because Var (j3' F(j3')) can be directly estimated from multiple runs of PA, and thus the 
accuracy of the algorithm as applied to a specific system can be estimated from the algorithm itself. Although Eq. 
[23] was derived for a single step of PA for the two- well model, the calculation does not rely on specific features of the 
model. Furthermore, since the variance is additive we conjecture that Eq. [23] is a good approximation for the full PA 



algorithm applied to any statistical mechanical system. In support of this conjecture, we note that Eq. 23 is a good 
approximation for the PA estimate of the low temperature free energy of the one-dimensional Ising spin glass studied 
in |14j for which the exact free energy can be calculated using transfer matrix methods. 

For the case of the two-well model, we can evaluate the Gaussian integrals exactly for both the mean and variance 
of the weight factor Y. The general result is 

1(0, F, A) = f [(/?' - (3 C ) 2 -{fi- f3 c ) 2 } (24) 



■ log cosh 



03'-/?c) 2 f + A/2 



log cosh 



A/2 



+0 



1 L ex \( B > 3) 2 K ] ( C08h ill ~ &) 2g / 4 ] cosh W M 2 H/4 + A] 

2R 2R° XP[[P P) H cosh [( / 3'- / 3 c ) 2 ff/4 + A/2] 2 

1 



From this general result it is instructive to consider expansions to first order in H. The estimate of the free energy 
difference is 

B'F(B') - W) - - 1 [(/?' - Be) 2 - [fi - Be) 2 ] (25) 



The first term on the RHS of this expression is the exact free energy difference in the two-well model for symmetric 
wells. The second term is the error made by population annealing, which decreases inversely in R. The form of the 
error term also reveals that the size of the temperature steps should be (/?' — B) < 1/yK to keep the error under 
control as the barrier height increases. 

The error in the free energy estimate at Bo for small H can be obtained by summing the errors made in each 
temperature step of the algorithm. Since the errors depend only on the size of the temperature step, the algorithm is 
optimized with constant size steps in 0. Thus the error estimate at Bo is 

ftF(A) - 0oF(Bo) = ^ (exp [(Bo - Be) 2 K/S 2 ] - l) + O L^-\ + 0(H 2 ) . (26) 

Next we consider the fraction of the population in the deep well at temperature 0' given the equilibrium value at 
B- Combining Eq. |17| and |24| and expanding to leading order in H, we obtain 

cOT = i + ^£ (27) 



H 

~8R 



(0' - B)(W -P- m exp [(/?' - B?K] + O + 0(H 2 ) 



The first two terms on the RHS of this expression are the leading order in H expansion of the exact value, Eq. [12] 
The correction term shows that again the temperature steps should satisfy (/?' — /?) < 1/ \K. 

From Eq. [27] we could obtain an estimate of the overall error in the fraction in the deep well by summing over the 
S temperature steps. Unfortunately the result significantly underestimates the true error. The reason is that the 
resampling step introduces correlations between replicas so that the probability distribution for the number of replicas 
in the deep well has a variance that is broader than that of the binomial distribution assumed in the above analysis. 
Nonetheless, we conjecture that the leading term in the error made by the full algorithm in the fraction in the deep 
well, (c(Bo) — c(Bo)), behaves as 1/R and can be minimized when S ~ \[K. The effect of correlations is much less 
important for the free energy, as evidenced by the absence of a term that is order H in Eq. |25[ and we conjecture 
that Eq. [26] is exact to leading order. We are currently studying these questions. 

The two main conclusion from this analysis are that (1) the error decreases inversely with the number of replicas 
and (2) the error can be made small only if the temperature step size satisfies (/?' — B) % 
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VI. PARALLEL TEMPERING VS POPULATION ANNEALING 

How do PT and PA compare in the efficiency with which they converge to equilibrium? Here we estimate the 
amount of computational work needed to make the deviation from equilibrium small. The quantities that we increase, 
holding other parameters fixed, are the number of replicas R for PA and the number of sweeps t for PT. Within the 
stylized two-well model, we define computational work W as the total number of times replicas are acted on by the 
equilibrating subroutine. For PA, with R replicas and S temperature steps the work is W = RS. For PT with R 
replicas the computational work is given by W = Rt where t is the number of PT sweeps. 

Our measure of computational work ignores the time required to resample the population in PA or implement 
replica exchange in PT. For large systems, this time is negligible compared to the time spent in the equilibrating 
subroutine. The quantity W assigns one unit of time for one sweep of the equilibrating subroutine for a single replica; 
thus the computational work measured in elementary operations rather than sweeps of the equilibrating subroutine 
is iVW, where N is the number of degrees of freedom of the system. Since N is the same for PA and PT and not 
explicitly defined in the two-well model, we don't consider this factor explicitly. 

Suppose we carry out the simulations on a massively parallel computer and consider parallel time instead of 
sequential time (work). Since each replica can be independently acted on by the equilibrating subroutine, one parallel 
time unit is required for one sweep of all the replicas. Thus, for the highly parallel PA, the parallel time is the number 
of temperature steps S whereas for the less parallel PT, the parallel time is the number of PT sweeps t. 

In [14] we analyzed the efficiency of PT for the two-well model. As is generally the case for Markov chain Monte 
Carlo methods, convergence to equilibrium is asymptotically exponential. The deviation from equilibrium is controlled 
by an exponential autocorrelation time r oxp , and its leading behavior is proportional to exp(— i/r cxp ), where t is the 
number of Monte Carlo sweeps. In the two-well model, for small asymmetries between the wells (i.e., (/3q — f3 c ) 2 H < 1), 
T exp is controlled by the diffusive time scale for a replica to diffuse between the lowest and highest temperature. If the 
number of temperatures is sufficiently large, then the acceptance fraction for replica exchange is not small and the 
elementary time step in this diffusive process is order unity so that r oxp ~ R 2 . The optimum number of replicas was 



shown to scale asR = S+ l~vK and given this choice, r exp ~ R ~ K. When the asymmetry becomes large, the 
optimum number of temperatures remains approximately the same but there is a crossover to a ballistic regime and 
T cxp ~ R r~j \[K. Convergence to equilibrium occurs on a time scale r exp so that, in the small-asymmetry diffusive 
regime, the work Wo required to begin to achieve moderately accurate results is given by Wo ~ RTcxp ~ K 3 / 2 . 
For PA we found in Sec. [vj Eq. 24 that the error term depends on K as exp [(/?' — f3) 2 K~\ so that the optimum 



number of temperature steps scales as S ~ v K, just as is the case for PT. Based on the form of Eq. 27 we conjecture 
that the overall error behaves as S a /R with a an exponent less than or equal to unity. Thus, the error decreases as 
S 1+a /W ~ /^( 1 + a )/ 2 /W, and the computational work Wo required to begin to achieve moderately accurate results 
behaves as Wo ~ K^ 1+a ^ 2 . For large systems (large K) and nearly degenerate free energy minima, population 
annealing is expected to be more efficient initially than parallel tempering by a factor of a power of the barrier 
height, K x ~ a l 2 . However, for large amounts of computational work (i.e., W ^> Rr cxp ~ K z / 2 ) PT is much closer to 
equilibrium than PA because PT converges exponentially in t while PA converges inversely in the comparable variable 
R. 

VII. NUMERICAL RESULTS 

We have carried out simulations of the two-well model using both PT and PA to compare the efficiency of the 
algorithms and test the conjectures of the previous section. In these simulations, the equilibrating subroutine samples 
a Gaussian random number with mean and variance appropriate to the temperature and well-index of the replica, 
Eq. [l3j The well index is a conserved quantity except at f3 c . At f3 c the equilibrating subroutine first chooses the well 
index with equal probability and then chooses the energy according to Eq. [13] with j3 — j3 c and the given value of a. 

Figures l(a)[ 1(b) and 1(c) show the convergence to equilibrium of 7, the deviation from equilibrium of the proba- 



bility of being in the deep well at the lowest temperature given by 

7 = c(/?o)-c(A)), (28) 

as a function the number of sweeps t for PT (small blue points) or populations size R for PA (large red points). 
The horizontal axis therefore measures computational work for both algorithms in the same units. The figures differ 
according to th e valu e of the well depth K and num ber of temperatures S with K = 16, S = 11 for Fig. |l(a)| K — 64, 
S = 23 for Fig. 1(b) | and K — 256, S = 47 for Fig. 1(c) In each case, there is a small asymmetry, H — 0.1 and at 



the lowest temperature c(/3q) — 0.68997. The highest and lowest temperatures are (3 C = 1 and (3q — 5, respectively, 
and the number of temperatures S + 1 is chosen to be close to the optimum value for PT and to scale as yK. The 
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FIG. 1: The deviation of the fraction in the deep well from the equilibrium value 7 vs. Monte Carlo sweeps t for PT (small 
blue points) or population size 7? for PA (large red points) for the values of K and S shown in the subcaptions. For each case 
H = 0.1. 



simulations confirm the conclusions of Sec. VI and show that for larger systems (larger K) PA is initially closer to 
equilibrium than PT for the same amount of computational work. 

We have also tested two conjectures concerning the convergence of PA for the two-well model. The first conjecture 
is that the error in the fraction in the deep well decreases inversely in the population size. Figure [2] shows Rj as a 
function of R for the case H = 0.1, K = 64, and S = 23. It is clear that within the error bars 7 is behaving as 1/R 
over the range of R studied. The averages and error bars are obtained from 10 5 independent runs of PA for each 
population size. 

The second conjecture is that for large R and S ~ vK 



i? 7 - K a/2 . 



(29) 



Figure [3] is a log-log plot of R-y vs. K. For each value of K, PA is run with a large population size R = 10000, 
H = 0.1, and the values of S given above satisfying S ~ \f~K. Averages and errors are obtained from 10 5 independent 
runs. The best fit to Eq. [29] is shown as the solid line and yields a = 0.85 ± 0.08. Note that a < 1 as conjectured, 
supporting the hypothesis that for modest amounts of computational work, PA yields a smaller error in 7 than PT 
and that this advantage increases with well-depth parameter K. 



VIII. DISCUSSION 



We have seen that both parallel tempering and population annealing are able to solve the problem of sampling 
equilibrium states of systems having several minima in the free energy landscape separated by high barriers. Parallel 
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FIG. 2: R-y vs. R for K = 64, S = 23 and H = 0.1. 
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FIG. 3: i?7 vs. K for R = 10000 and H = 0.1. The solid line is the best power-law fit, i?7 = if 



tempering is a Markov chain Monte Carlo method and thus converges to equilibrium exponentially in the number 
of sweeps, whereas population annealing converges inversely in the population size. Their relative efficiencies are 
described qualitatively by the parable of the tortoise and the hare. For a given amount of computational work, the 
hare (population annealing) is initially closer to equilibrium but ultimately the tortoise (parallel tempering) catches 
up and gets ahead. 

The problem of high barriers between different free energy minima is only one of the difficulties encountered in 
simulating systems with rough free energy landscapes. A second, generic problem is that the relevant free energy 
minima may have small basins of attraction so that when the system is annealed starting from high temperature, 
it is very unlikely that the relevant low temperature states will be found. Small basins of attraction occur for first 
order transitions and for NP-hard combinatorial optimization problems and are almost certainly a feature of the low 
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temperature phase of spin glasses. In this situation a very large number of sweeps of parallel tempering or a very 
large population size in population annealing are required simply to find the relevant states. 

Parallel tempering is widely used in several areas of computational physics while population annealing is not well 
known. One of the conclusions of this paper is that population annealing is an attractive alternative to parallel 
tempering for studies where moderately accurate answers are required quickly. This is especially the case if massively 
parallel computing resources are available since population annealing is well suited to parallelization. 

Acknowledgments 

Jon Machta was supported in part from NSF grant DMR-0907235 and Richard Ellis by NSF grant DMS-0604071. 



[1] Y. Okamoto, Journal of Molecular Graphics and Modelling 22, 425 (2004). 
[2] W. Nadler and U. H. E. Hansmann, Phys. Rev. E 75, 026109 (2007). 
[3] R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 57, 2607 (1986). 

[4] C. Geyer, in Computing Science and Statistics: 23rd Symposium on the Interface, edited by E. M. Keramidas (Interface 

Foundation, Fairfax Station, 1991), p. 156. 
[5] K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996). 
[6] D. J. Earl and M. W. Deem, Phys. Chem. Chem. Phys. 7, 3910 (2005). 
[7] H. G. Katzgraber, M. Korner, and A. P. Young, Phys. Rev. B 73, 224432 (2006). 

[8] R. A. Banos, A. Cruz, L. A. Fernandez, J. M. Gil-Narvion, A. Gordillo- Guerrero, M. Guidetti, A. Maiorano, F. Mantovani, 

E. Marinari, V. Martin-Mayor, et al., Journal of Statistical Mechanics: Theory and Experiment 2010, P06026 (2010). 
[9] U. H. E. Hansmann, Chem. Phys. Lett. 281, 140 (1997). 
[10] A. Schug, T. Herges, A. Verma, and W. Wenzel, Journal of Physics: Condensed Matter 17, S1641 (2005). 
[11] G. Burgio, M. Fuhrmann, W. Kerler, and M. Muller-Preussker, Physical Review D 75, 014504 (2007). 
[12] K. Hukushima and Y. Iba, in THE MONTE CARLO METHOD IN THE PHYSICAL SCIENCES: Celebrating the 50th 

Anniversary of the Metropolis Algorithm, edited by J. E. Gubernatis (AIP, 2003), vol. 690, pp. 200-206. 
[13] J. Machta, Phys. Rev. E 82, 026704 (2010). 
[14] J. Machta, Phys. Rev. E 80, 056706 (2009). 

[15] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science 220, 671 (1983). 

[16] P. van Laarhoven and E. Aarts, Simulated Annealing: Theory and Applications (D. Reidel publishing company, 1987). 

[17] J. B. Anderson, J. Chem. Phys. 63, 1499 (1975). 

[18] P. Grassberger, Computer Physics Communications 147, 64 (2002). 

[19] A. Doucet, N. de Freitas, and N. Gordon, eds., Sequential Monte Carlo Methods in Practice (Springer- Verlag, New York, 
2001). 

[20] H. G. Katzgraber, S. Trebst, D. A. Huse, and M. Troyer, J. Stat. Mech. 03018 (2006). 
[21] S. Trebst, M. Troyer, and U. H. E. Hansmann, J. Chem. Phys. 124, 174903 (2006). 
[22] E. Bittner, A. Nufibaumer, and W. Janke, Phys. Rev. Lett. 101, 130603 (2008). 



